Journals
  Publication Years
  Keywords
Search within results Open Search
Please wait a minute...
For Selected: Toggle Thumbnails
Double complex convolution and attention aggregating recurrent network for speech enhancement
Bennian YU, Yongzhao ZHAN, Qirong MAO, Wenlong DONG, Honglin LIU
Journal of Computer Applications    2023, 43 (10): 3217-3224.   DOI: 10.11772/j.issn.1001-9081.2022101533
Abstract132)   HTML4)    PDF (1993KB)(80)       Save

Aiming at the problems of limited representation of spectrogram feature correlation information and unsatisfactory denoising effect in the existing speech enhancement methods, a speech enhancement method of Double Complex Convolution and Attention Aggregating Recurrent Network (DCCARN) was proposed. Firstly, a double complex convolutional network was established to encode the two-branch information of the spectrogram features after the short-time Fourier transform. Secondly, the codes in the two branches were used in the inter- and and intra-feature-block attention mechanisms respectively, and different speech feature information was re-labeled. Secondly, the long-term sequence information was processed by Long Short-Term Memory (LSTM) network, and the spectrogram features were restored and aggregated by two decoders. Finally, the target speech waveform was generated by short-time inverse Fourier transform to achieve the purpose of suppressing noise. Experiments were carried out on the public dataset VBD (Voice Bank+DMAND) and the noise added dataset TIMIT. The results show that compared with the phase-aware Deep Complex Convolution Recurrent Network (DCCRN), DCCARN has the Perceptual Evaluation of Speech Quality (PESQ) increased by 0.150 and 0.077 to 0.087 respectively. It is verified that the proposed method can capture the correlation information of spectrogram features more accurately, suppress noise more effectively, and improve speech intelligibility.

Table and Figures | Reference | Related Articles | Metrics
Research advances in disentangled representation learning
Keyang CHENG, Chunyun MENG, Wenshan WANG, Wenxi SHI, Yongzhao ZHAN
Journal of Computer Applications    2021, 41 (12): 3409-3418.   DOI: 10.11772/j.issn.1001-9081.2021060895
Abstract1090)   HTML144)    PDF (877KB)(498)       Save

The purpose of disentangled representation learning is to model the key factors that affect the form of data, so that the change of a key factor only causes the change of data on a certain feature, while the other features are not affected. It is conducive to face the challenge of machine learning in model interpretability, object generation and operation, zero-shot learning and other issues. Therefore, disentangled representation learning always be a research hotspot in the field of machine learning. Starting from the history and motives of disentangled representation learning, the research status and applications of disentangled representation learning were summarized, the invariance, reusability and other characteristics of disentangled representation learning were analyzed, and the research on the factors of variation via generative entangling, the research on the factors of variation with manifold interaction, and the research on the factors of variation using adversarial training were introduced, as well as the latest research trends such as a Variational Auto-Encoder (VAE) named β-VAE were introduced. At the same time, the typical applications of disentangled representation learning were shown, and the future research directions were prospected.

Table and Figures | Reference | Related Articles | Metrics